17 research outputs found

    A new census of protein tandem repeats and their relationship with intrinsic disorder

    Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. Positive linear correlation between the proportion of TRs and the protein length was observed universally, with Eukaryotes in general having more TRs, but when the difference in length is taken into account the difference is quite small. TRs were enriched with disorder-promoting amino acids and were inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.

    After Recess: Historical Practice, Textual Ambiguity, and Constitutional Adverse Possession

    The Supreme Court’s interpretation of the Recess Appointments Clause in NLRB v. Noel Canning stands as one of the Court’s most significant endorsements of the relevance of “historical gloss” to the interpretation of the separation of powers. This Article uses the decision as a vehicle for examining the relationship between interpretive methodology and historical practice, and between historical practice and textual ambiguity. As the Article explains, Noel Canning exemplifies how the constitutional text, perceptions about clarity or ambiguity, and “extra-textual” considerations such as historical practice operate interactively rather than as separate elements of interpretation. The decision also provides a useful entry point into critically analyzing the concept of constitutional “liquidation,” which the majority in Noel Canning seemed to conflate with historical gloss but which seems more consistent with the approach to historical practice reflected in Justice Scalia’s concurrence in the judgment. Finally, this Article argues that the historical gloss approach, when applied cautiously and with sensitivity to the potential concerns raised by Justice Scalia and others, is not vulnerable to the charge of licensing executive aggrandizement by “adverse possession.”

    WebSTR: a population-wide database of short tandem repeat variation in humans.

    Short tandem repeats (STRs) are consecutive repetitions of one to six nucleotide motifs. They are hypervariable due to the high prevalence of repeat unit insertions or deletions primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently, STRs have been poorly studied since they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large genome-wide datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations. Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset. WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at http://webstr.ucsd.edu
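
The definition of an STR given in this abstract (consecutive repetitions of a one- to six-nucleotide motif) can be illustrated with a small sketch. This is not the annotation method behind WebSTR; it only finds perfect repeats with a regular expression, and the function name `find_strs` and the `min_copies` threshold are assumptions made for illustration.

```python
import re


def find_strs(seq, min_copies=3, max_motif=6):
    """Return (start, motif, copies) for perfect tandem repeats.

    Illustrative only: real STR annotation tools also tolerate
    imperfect repeats, which this sketch does not.
    """
    # Lazy quantifier so the shortest repeating motif is reported first;
    # the backreference then requires at least min_copies - 1 further copies.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


# A CAG trinucleotide repeated four times, flanked by unrelated bases.
example_hits = find_strs("GGTCAGCAGCAGCAGTT")
```

Here `example_hits` holds one entry per repeat tract, giving its 0-based start position, the repeat unit and the copy number.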

    Alpha-internexin interaction network

    Alpha-internexin interaction network downloaded from stringdb.

    Bioinformatics notebook for Plot.ly

    FANTOM5 provides high-precision data for thousands of human and mouse samples. The vastness of this data can be overwhelming, and working with it locally is challenging. Luckily, there are many tools out there to make our lives easier. To create a small data subset to work with in this tutorial, I used TET: Fantom 5 Table Extraction tool. I picked a few human samples, mostly brain tissues with a few outliers such as uterus, and downloaded a tab-separated file from the website. For more advanced data extraction, it's good to have a look at TET's API. I picked normalized TPM (tags per million) and annotated data, so we can focus only on processed data for protein-coding genes.
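
As a rough illustration of working with such a tab-separated export, the sketch below reads a small table of TPM values into a nested dictionary using only the Python standard library. The column layout, gene names, sample names and values are hypothetical stand-ins, not the actual TET output format, and `load_tpm_table` is a name chosen for this example.

```python
import csv
import io

# Hypothetical stand-in for a downloaded file: one row per gene,
# one normalized TPM column per sample. Real exports may differ.
SAMPLE_TSV = (
    "gene\tbrain_adult\tcerebellum\tuterus\n"
    "GENE_A\t12.5\t10.0\t0.5\n"
    "GENE_B\t0.0\t1.5\t30.0\n"
)


def load_tpm_table(handle):
    """Read a tab-separated expression table into {gene: {sample: tpm}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        gene = row.pop("gene")  # assumed name of the identifier column
        table[gene] = {sample: float(value) for sample, value in row.items()}
    return table


table = load_tpm_table(io.StringIO(SAMPLE_TSV))
```

For a file on disk, passing `open(path, newline="")` instead of the `io.StringIO` object should work the same way, assuming the file has a header row like the one above.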

    High GC content causes orphan proteins to be intrinsically disordered

    De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account, we find that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.

    Structural properties of proteins of different ages plotted against the GC content of the genome (coding regions).

    For clarity only the ancient (blue) and orphan (red) proteins are shown individually, but the linear fitted lines for genus orphans (pink line) and intermediate ones (light blue) are also shown. In the text box three values are presented: rank-sum p-value = p-value of a rank-sum test of orphans versus ancient (only the property on y axis is considered); correlation p-values = p-value of a linear regression test for orphan and ancient.

    A simple metric of promoter architecture robustly predicts expression breadth of human genes suggesting that most transcription factors are positive regulators

    Background: Conventional wisdom holds that, owing to the dominance of features such as chromatin level control, the expression of a gene cannot be readily predicted from knowledge of promoter architecture. This is reflected, for example, in a weak or absent correlation between promoter divergence and expression divergence between paralogs. However, an inability to predict may reflect an inability to accurately measure or employment of the wrong parameters. Here we address this issue through integration of two exceptional resources: ENCODE data on transcription factor binding and the FANTOM5 high-resolution expression atlas. Results: Consistent with the notion that in eukaryotes most transcription factors are activating, the number of transcription factors binding a promoter is a strong predictor of expression breadth. In addition, evolutionarily young duplicates have fewer transcription factor binders and narrower expression. Nonetheless, we find several binders and cooperative sets that are disproportionately associated with broad expression, indicating that models more complex than simple correlations should hold more predictive power. Indeed, a machine learning approach improves fit to the data compared with a simple correlation. Machine learning could at best moderately predict tissue of expression of tissue specific genes. Conclusions: We find robust evidence that some expression parameters and paralog expression divergence are strongly predictable with knowledge of transcription factor binding repertoire. While some cooperative complexes can be identified, consistent with the notion that most eukaryotic transcription factors are activating, a simple predictor, the number of binding transcription factors found on a promoter, is a robust predictor of expression breadth

    For the 187 considered species, the number of species in which a property is significantly higher (increasing) or significantly lower (decreasing) in orphans compared to ancient proteins is shown.


    Running averages of structural properties computed from amino acid scales against GC content: (a) Intrinsic Disorder Propensity (TOP-IDP); (b) hydrophobicity (Hessa scale); (c,d,e,f) average propensity for secondary structure of type, respectively, turn, coil, beta sheet and alpha helix.

    For each property, colored lines represent proteins of different age: orphans (red), genus orphans (pink), intermediate (light blue) and ancient (blue). The black lines represent randomly generated proteins at different GC frequencies.